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We present a novel statistical mechanics formalism for the theoretical description 
of the process of protein foldingounfolding transition in water environment. The 
formalism is based on the construction of the partition function of a protein obeying 
two-stage-like folding kinetics. Using the statistical mechanics model of solvation 



i-G \ of hydrophobic hydrocarbons we obtain the partition function of infinitely diluted 

solution of proteins in water environment. The calculated dependencies of the protein 

heat capacities upon temperature are compared with the corresponding results of 

experimental measurements for staphylococcal nuclease and metmyoglobin. 
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I. INTRODUCTION 

Proteins are biological polymers consisting of elementary structural units, amino acids. 
Being synthesized at ribosome, proteins are exposed to the cell interior where they fold 
into their unique three dimensional structure. The process of forming the protein's three 
dimensional structure is called the process of protein folding. The correct folding of protein is 
of crucial importance for the protein's proper functioning. Despite numerous works devoted 
to investigation of protein folding this process is still not entirely understood. The current 
state-of-the-art in experimental and theoretical studies of the protein folding process are 
described in recent reviews and references therein 

In this paper we develop a novel theoretical method for the description of the protein fold- 
ing process which is based on the statistical mechanics principles. Considering the process 
of protein folding as a first order phase transition in a finite system, we present a statistical 
mechanics model for treating the foldingounfolding phase transition in single-domain pro- 
teins. The suggested method is based on the theory developed for the helix -B-coil transition 



l5|. 



in polypeptides discussed in 



- |l7j . A way to construct a parameter-free partition function 



or a system experiencing a-helix-B-random coil phase transition in vacuo was studied in 



,8 



6|. In [8] we have calculated potential energy surfaces (PES) of polyalanines of different 
lengths with respect to their twisting degrees of freedom. This was done within the frame- 
work of classical molecular mechanics. The calculated PES were then used to construct a 
parameter-free partition function of a polypeptide and to derive various thermodynamical 
characteristics of alanine polypeptides as a function of temperature and polypeptide length. 

In this paper we construct the partition function of a protein in vacuo, which is the 
further generalization of the formalism developed in 9[ , accounting for folded, unfolded and 
prefolded states of the protein. This way of the construction of the partition function is 
consistent with nucleation-condensation scenario of protein folding, which is a very common 



scenario for globular proteins |18| and implies that at the early stage of protein folding the 
native-like hydrophobic nucleus of protein is formed, while at the later stages of the protein 
folding process all the rest of amino acids also attain the native-like conformation. 

For the correct description of the protein folding in water environment it is of primary 
importance to consider the interactions between the protein and the solvent molecules. The 
hydrophobic interactions are known to be the most important driving forces of protein folding 



19|. In the present work we present a way how one can construct the partition function of 
the protein accounting for the interactions with solvent, i.e. accounting for the hydrophobic 
effect. The most prominent feature of our approach is that it is developed for concrete 
systems in contrast to various generalized and toy-models of protein folding process. 

We treat the hydrophobic interactions in the system using the statistical mechanics for- 



malism developed in [20[ for the description of the thermodynamical properties of the sol- 
vation process of aliphatic and aromatic hydrocarbons in water. However, accounting solely 
for hydrophobic interactions is not sufficient for the proper description of the energetics 
of all conformational states of the protein and one has to take electrostatic interactions 
into account. In the present work the electrostatic interactions are treated within a similar 



framework as described in 21]. 



We have applied the developed statistical mechanics model of protein folding for two 
globular proteins, namely staphylococcal nuclease and metmyoglobin. These proteins have 
simple two-stage-like folding kinetics and demonstrate two folding<H-unfolding transitions, 



refereed as heat and cold denaturation 



22 



23J . The comparison of the results of the the- 



oretical model with that of the experimental measurements shows the applicability of the 
suggested formalism for an accurate description of various thermodynamical characteristics 
in the system, e.g. heat denaturation, cold denaturation, increase of the reminiscent heat 
capacity of the unfolded protein, etc. 

Our paper is organized as follows. In Sec. IIIAI we present the formalism for the construc- 
tion of the partition function of the protein in water environment and justify the assumptions 
made on the system's properties. In Section IHII we discuss the results obtained with our 
model for the description of folding-H-unfolding transition in staphylococcal nuclease and 
metmyoglobin. In Section IIVI we summarize the paper and suggest several ways for a fur- 
ther development of the theoretical formalism. 

II. THEORETICAL METHODS 

A. Partition function of a protein 

To study thermodynamic properties of the system one needs to investigate its potential 
energy surface with respect to all the degrees of freedom. For the description of macromolec- 



ular systems, such as proteins, efficient model approaches are necessary. 

The most relevant degrees of freedom in the protein folding process are the twisting 

n n fi n n nri n 

degrees of freedom along its backbone chain |6j, [7|, l9Hlll 114 . Il5l l24j . |25l |. These degrees of 



freedom are defined for each amino acid of the protein except for the boundary ones and 



are described by two dihedral angles (fi and ^ (for definition of </?, and ipi see e. 
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Q r 



.g. laUls 



115|). 



The degrees of freedom of a protein can be classified as stiff and soft ones. We call the 
degrees of freedom corresponding to the variation of bond lengths, angles and improper 
dihedral angles as stiff, while degrees of freedom corresponding to the angles (fi and ipi are 
soft degrees of freedom |6j . The stiff degrees of freedom can be treated within the harmonic 
approximation, because the energies needed for a noticeable structural rearrangement with 
respect to these degrees of freedom are about several eV, which is significantly larger than 
the characteristic thermal energy of the system (kT), being at room temperature equal to 



0.026 eV 



13 
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A Hamiltonian of a protein is constructed as a sum of the potential, kinetic and vibrational 
energy terms. Assuming the harmonic approximation for the stiff degrees of freedom it is 
possible to derive the following expression for the partition function of a protein in vacuo 
being in a particular conformational state j |6| : 

Z j = A 3 {kTf N ~^ [ ... / e-^«^»/ fcT d^ 1 ...d^d^...d^ n , (1) 

where T is the temperature, k is the Boltzmann constant, N is the total number of atoms 
in the protein, l s is the number of soft degrees of freedom, A,- is defined as follows: 



-4, 



V 3 ■ M3/2 ■ J/f/f /f It* 
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s(j) 



(27r)f7r^n-=i 



3N-6-l 3 



U) 



U) 



(2) 



Aj is a factor which depends on the mass of the protein M, its three main momenta of inertia 
I ) ' , specific volume Vj, the frequencies of the stiff normal vibrational modes u>\ and on 
the generalized masses ^ corresponding to the soft degrees of freedom 6|. e, in Eq. ([T]) 
describes the potential energy of the system corresponding to the variation of soft degrees 
of freedom. Integration in Eq. (CQ) is performed over a certain part of a phase space of the 
system (a subspace E,) corresponding to the soft degrees of freedom <p and ip. The form of 
the partition function in Eq. ([1]) allows one to avoid the multidimensional integration over 



the whole coordinate space and to reduce the integration only to the relevant parts of the 
phase space. €j in Eq. (JT]) denotes the potential energy surface of the protein as a function of 
twisting degrees of freedom in the vicinity of protein's conformational state j. Note that in 
general the proper choice of all the relevant conformations of protein and the corresponding 
set of Tj is not a trivial task. 

One can expect that the factors Aj in Eq. ([1]) depend on the chosen conformation of 
the protein. However, due to the fact that the values of specific volumes, momenta of 
inertia and frequencies of normal vibration modes of the system in different conformations 
are expected to be close [9|, |29[ , the values of Aj in all conformations become nearly equal, 
at least in the zero order harmonic approximation, i.e. Aj = A. Another simplification 
of the integration in Eq. (TjQ) comes from the statistical independence of amino acids. We 
assume that within each conformational state j all amino acids can be treated statistically 
independently, i.e. the particular conformational state of i-th amino acid characterized by 
angles <fi G Tj and ipi G Tj does not influence the potential energy surface of all other 
amino acids, and vice versa. This assumption is well applicable for rigid conformational 
states of the protein such as native state. For the native state of a protein all atoms of the 
molecule move in harmonic potential in the vicinity of their equilibrium positions. However, 
in unfolded states of the protein the flexibility of the backbone chain leads to significant 
variations of the distances between atoms, and consequently to a significant variation of 
interactions between atoms. Accurate accounting (both analytical and computational) for 
the interactions between distant atoms in the unfolded state of a protein is extremely difficult 



(see Ref. [3CJ for analytical treatment of interactions in unfolded states of a protein). In this 
work we assume that all amino acids in unfolded state of a protein move in the identical 
mean field created by all the amino acids and leave the corrections to this approximation 
for further considerations. 

With the above mentioned assumptions the partition function of a protein Z p (without 
any solvent) reads as: 

Zp = a . (kTr-z- i f J2 n £ £ ex P M J W*A d(Pidrlfii (3) 

where the summation over j includes all £ statistically relevant conformations of the protein, 
a is the number of amino acids in the protein and ef is the potential energy surface as a 
function of twisting degrees of freedom <£>j and ipi of the z-th amino acid in the j-th conforma- 
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tional state of the protein. The exact construction of ei yfi, ipi) for various conformational 
states of a particular protein will be discussed below. We consider the angles <p and ip as 
the only two soft degrees of freedom in each amino acid of the protein, and therefore the 
total number of soft degrees of freedom of the protein l s = la. 

Partition function in Eq. ([3]) can be further simplified if one assumes (i) that each amino 
acid in the protein can exist only in two conformations: the native state conformation and 
the random coil conformation; (ii) the potential energy surfaces for all the amino acids are 
identical. This assumption is applicable for both the native and the random coil state. It is 
not very accurate for the description of thermodynamical properties of single amino acids, 
but is reasonable for the treatment of thermodynamical properties of the entire protein. 
The judgment of the quality of this assumption could be made on the basis of comparison 
of the results obtained with its use with experimental data. Such comparison is performed 
in Sec. IHII of this work. 

Amino acids in a protein being in its native state vibrate in a steep harmonic potential. 
Here we assume that the potential energy profile of an amino acid in the native conformation 
should not be very sensitive to the type of amino acid and thus can be taken as, e.g., the 
potential energy surface for an alanine amino acid in the a-helix conformation |8(. Using 
the same arguments the potential energy profile for an amino acid in unfolded protein 
state can be approximated by e.g. the potential of alanine in the unfolded state of alanine 
polypeptide (see Ref. [8| for discussion and analysis of alanine's potential energy surfaces). 
Indeed, for an unfolded state of a protein it is reasonable to expect that once neglecting the 
long-range interactions all the differences in the potential energy surfaces of various amino 
acids arise from the steric overlap of the amino acids's radicals. This is clearly seen on 
alanine's potential energy surface at values of ip > 0° presented in Ref. 8|. But the part of 
the potential energy surface at <p > 0° gives a minor contribution to the entropy of amino 
acid at room temperature. This fact allows one to neglect all the differences in potential 
energy surfaces for different amino acids in an unfolded protein, at least in the zero order 
approximation. This assumption should be especially justified for proteins with the rigid 
helix-rich native structure. The staphylococcal nuclease, which we study here has definitely 
high a- helix content. Another argument which allows to justify our assumption for a wider 
family of proteins is the rigidity of the protein's native structure. Below, we validate the 
assumptions made by performing the comparison of the results of our theoretical model with 



the experimental data for a//3 rich protein metmyoglobin obtained in [23 ] . 

For the description of the folding <H- unfolding transition in small globular proteins obey- 
ing simple two-state-like folding kinetics we assume that the protein can exist in one of 
three states: completely folded state, completely unfolded state and partially folded state 
where some amino acids from the flexible regions with no prominent secondary structure 
are in the unfolded state, while other amino acids are in the folded conformation. With this 
assumption the partition function of the protein reads as: 

^ + i;. (<-(.4)!(« -o- *- (4) 

where Zj is defined in Eq. (CQ), Zq is the partition function of the protein in completely 
unfolded state, a is the total number of amino acids in a protein and k is the number of 
amino acids in flexible regions. The factorial term in Eq. fl4]) accounts for the states in which 
various amino acids from flexible regions independently attain the native conformation. The 
summation in Eq. (Jlj) is performed over all partially folded states of the protein, where a — n 
is the minimal possible number of amino acids being in the folded state. The factorial term 
describes the number of ways to select i — (a — k) amino acids from the flexible region of 
the protein consisting of k amino acids attaining native-like conformation. 

Finally, the partition function of the protein in vacuo has the following form: 

Z P = Z P - A(kT) 3N - 3 ~ a , (5) 

where 

7 ™ , ^ K\ZlZ^exp(z.E /kT) 



i=a—K 

TV P7V 



(i - (a - «))!(a - i)\ 



Z b = J J exp(-^g^)d^ (7) 



TV J —TV 



z. = ££«p(-^)d^. (8) 

Here we omitted the trivial factor describing the motion of the protein center of mass, 
which is of no significance for the problem considered, eb(ip,ip) (b stands for bound) is 
the potential energy surface of an amino acid in the native conformation and e u (tp,ip) (u 
stands for unbound) is the potential energy surface of an amino acid in the random coil 
conformation. The potential energy profile of an amino acid is calculated as a function of 
its twisting degrees of freedom <p and ip. Let us denote by e° and e° u the global minima 



8 

on the potential energy surfaces of an amino acid in folded and in unfolded conformations 
respectively. The potential energy of an amino acid then reads as e° b + e u> b(<f,ip). E in 
Eq. ([6]) is defined as the energy difference between the global energy minima of the amino 
acid potential energy surfaces corresponding to the folded and unfolded conformations, i.e. 
Eq = e Q u — e®. The potential energy surfaces for amino acids as functions of angles ip and ip 



were calculated and thoroughly analyzed in 

In nature proteins perform their function in the aqueous environment. Therefore the 
correct theoretical description of the folding-H-unfolding transition in water environment 
should account for solvent effects. 

B. Partition function of a protein in water environment 

In this section we evaluate Eq and construct the partition function for the protein in 
water environment. 

The partition function of the infinitely diluted solution of proteins Z can be constructed 
as follows: 

Z = Y,^ )z w\ (9) 

where Z^J is the partition function of all water molecules in the j-th conformational state of 
a protein and Zp is the partition function of the protein in its j-th conformational state, in 
which we further omit the factor describing the contribution of stiff degrees of freedom in the 
system. This is done in order to simplify the expressions, because stiff degrees of freedom 
provide a constant contribution to the heat capacity of the system since the heat capacity 
of the ensemble of harmonic oscillators is constant. Below for the simplicity of notations we 
put Z p = Z p . 

There are two types of water molecules in the system: (i) molecules in pure water and 
(ii) molecules interacting with the protein. We assume that only the water molecules being 
in the vicinity of the protein's surface are involved in the folding<H-unfolding transition, 
because they are affected by the variation of the hydrophobic surface of a protein. This 
surface is equal to the protein's solvent accessible surface area (SASA) of the hydrophobic 
amino acids. The number of interacting molecules is proportional to SASA and include only 
the molecules from the first protein's solvation shell. This area depends on the conformation 



9 

of the protein. The main contribution to the energy of the system caused by the variation 
of the protein's SASA associated with the side-chain radicals of amino acids because the 



contribution to the free energy assosiated with solvation of protein's backbone is small 31 ]. 
Thus, in this work we pay the main attention to the accounting for the SASA change arising 
due to the solvation of side chain radicals. 

We treat all water molecules as statistically independent, i.e. the energy spectra of the 
states of a given molecule and its vibrational frequencies do not depend on a particular 
state of all other water molecules. Thus, the partition function of the whole system Z can 
be factorized and reads as: 

z = Y.z^z]^z^\ (10) 

where £ is the total number of states of a protein, Z s is the partition function of a water 
molecule affected by the interaction with the protein and Z w is the partition function of 
a water molecule in pure water. Y c (j) is the number of water molecules interacting with 
the protein in the j'-th conformational state. N t is the total number of water molecules in 
the system. To simplify the expressions we do not account for water molecules that do not 
interact with the protein in any of its conformational states, i.e. N t = max j{Y c (j)}. 



To construct the partition function of water we follow the formalism developed in [20 1 
and refer only to the most essential details of that work. The partition function of a water 
molecule in pure water reads as: 

4 

Z w = Y,fofieM-Ei/kT)}, (11) 

1=0 

where the summation is performed over 5 possible states of a water molecule (the states in 
which water molecule has 4,3,2,1 or hydrogen bonds with the neighboring molecules). E\ 
are the energies of these states and £; are the combinatorial factors being equal to 1,4,6,4,1 
for / = 0, 1, 2, 3, 4, respectively. They describe the number of choices to form a given number 
of hydrogen bonds. /; in Eq. (TTTj) describes the contribution due to the partition function 
arising to to the translation and libration oscillations of the molecule. In the harmonic 
approximation f\ are equal to: 



,cn 



(L) 



fi= l-expi-hvf'/kT) 1 - exp(-huf } /kT) , (12) 
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Number of hydrogen bonds 12 3 4 

Energy level, E t (kcal/mol) 6.670 4.970 3.870 2.030 

Energy level, Ef (kcal/mol) 6.431 4.731 3.631 1.791 -0.564 

(T) _i 

Translational frequencies, v\ , cm 26 86 61 57 210 

Librational frequencies, v\ , cm -1 197 374 500 750 750 



120 



TABLE I. Parameters of the partition function of water according to j20l ] 

(T) (L) 

where v\ and v) are translation and libration motions frequencies of a water molecule 



in its /-th state, respectively. These frequencies are calculated in Ref. [20|] and are given in 
Table [H The contribution of the internal vibrations of water molecules is not included in 
Eq. ( ITT1) because the frequencies of these vibrations are practically not influenced by the 
interactions with surrounding water molecules. 

The partition function of a water molecule from the protein's first solvation shell reads 
as: 

4 

Z s = Y,iMieM-E?/kT)], (13) 

1=0 

where fi are defined in Eq. (1T2J) and Ef denotes the energy levels of a water molecule 
interacting with aliphatic hydrocarbons of protein's amino acids. Values of energies Ef are 
given it Table [H For simplicity we treat all side-chain radicals of a protein as aliphatic 
hydrocarbons because most of the protein's hydrophobic amino acids consist of aliphatic- 
like hydrocarbons. It is possible to account for various types of side chain radicals by 
using the experimental results of the measurements of the solvation free energies of amino 



acid radicals from Ref. 32] and associated works. However, this correction will imply the 



reparametrization of the theory presented in 20| and will lead to the introduction of ~ 20-5 



additional parameters. Here we do not perform such a task since this kind of improvement 
of the theory would smear out the understanding of the principal physical factors underlying 
the protein foldingOunfolding transition. 

In our theoretical model we also account for the electrostatic interaction of protein's 
charged groups with water. The presence of electrostatic field around the protein leads to 
the reorientation of H2O molecules in the vicinity of charged groups due to the interaction 
of dipole moments of the molecules with the electrostatic field. The additional factor arising 
in the partition function ffTTT) of the molecules reads as: 



11 



M±/exp(-^^)si„ M *rf, (14 > 

where E is the strength of the electrostatic field, d is the absolute value of the H 2 molecule 
dipole moment, a is the ratio of the number of water molecules that interact with the 
electrostatic field of the protein (Ne) to the number of water molecules interacting with 
the surface of the amino acids from the inner part of the protein while they are exposed to 
water when the protein is being unfolded (N w ), i.e. a = Ne/N w . Note that the effects of 
electrostatic interaction turn out to be more pronounced in the folded state of the protein. 
This happens because in the unfolded state of a protein opposite charges of amino acid's 
radicals are in average closer in space due to the flexibility of the backbone chain, while in 
the folded state the positions of the charges are fixed by the rigid structure of a protein. 

Integrating Eq. ( I14p allows to write the factor Ze for the partition function of a single 
H2O molecule in pure water in the form: 

'A;Tsinh[f^] 



Ed 



(15) 



This equation shows how the electrostatic field enters the partition function. In general, 
E depends on the position in space with respect to the protein. However, here we neglect this 
dependence and instead we treat the parameter E as an average, characteristic electrostatic 
field created by the protein. 

Let us denote by N s the number of water molecules interacting with the proteins surface 
in its folded state i.e. N t = N s + N w ; where N t is defined in Eq (TTOl) . We assume that the 
number of water molecules interacting with the protein (Y c ) is linearly dependent on the 
number of amino acids being in the unfolded conformation, i.e. Y c = N s + iN w /a, where i 
is the number of the amino acids in the unfolded conformation and a is the total number 
of amino acids in the protein. Thus, the partition function ( TTOl) with the accounting for the 
factor ( TT5|) reads as: 



Z = Z»> ■ J2 (z b Zy«Z^ /a e W (i ■ E /kT)) lW (Z u Z^) a ~ lU) , (16) 



\Z h ZZ-' a Z^'- exp [i ■ Eo/kT)^ ^ 

where i(j) denotes the number of the amino acids being in the folded conformation when the 
protein is in the j-th conformational state. Accounting for the statistical factors for amino 
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acids being in the folded and unfolded states, similarly to how it was done for the vacuum 
case (see Eq. (jSJ)), one derives from Eq. f fT6|) the following final expression: 



yayN w , V^ /t! exp (j ■ E /kT) / 7 . Nw / a7 N w /a\ i , 7 7 N w /a\a-i 



z = (z s f° x (17) 

k\ exp {i ■ Eo/kT) 
(i — (a — k))!(o — i)! 

where the term in the square brackets accounts for all statistically significant conformational 
states of the protein. 

Having constructed the partition function of the system we can evaluate with its use all 
thermodynamic characteristics of the system, such as e.g. entropy, free energy, heat capacity, 
etc. The free energy (F) and heat the capacity (c) of the system can be calculated from the 
partition function as follows: 

F(T) = -kT In Z(T), (18) 

C ( T ) = - T ^P- ( 19 ) 

In this work we analyze the dependence of protein's heat capacity on temperature and 
compare the predictions of our model with available experimental data. 

III. RESULTS AND DISCUSSION 

In this section we calculate the dependencies of the heat capacity on temperature for two 



globular proteins metmyog 



experimental data from 



22 



obin and staphylococcal nuclease and compare the results with 



23]. 



The structures of metmyoglobin and staphylococcal nuclease proteins are shown in Fig. [lj 
These are relatively small globular proteins consisting of ~150 amino acids. Under certain 
experimental conditions (salt concentration and pH) the metmyoglobin and the staphylo- 
coccal nuclease experience two foldingounfolding transitions, which induce two peaks in 
the dependency of heat capacity on temperature (see further discussion). The peaks at 
lower temperature are due to the cold denaturation of the proteins. The peaks at higher 
temperatures arise due to the ordinary foldingounfolding transition. The availability of 
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experimental data for the heat capacity profiles of the mentioned proteins, the presence of 
the cold denaturation and simple two-stage-like folding kinetics are the reasons for select- 
ing these particular proteins as case studies for the verification of the developed theoretical 
model. 




/ ^ 




FIG. 1. a) Structure of staphylococcal nuclease (PDB ID 1EYD 33J]), and b) horse heart metmyo- 
globin (PDB ID 1YMB 



34(| ). Images have been rendered using VMD program 



35J. 



A. Heat capacity of staphylococcal nuclease 



Staphylococcal or micrococcal nuclease (S7 Nuclease) is a relatively nonspecific enzyme 
that digests single-stranded and double-stranded nucleic acids, but is more active on single- 
stranded substrates 36|. This protein consists of 149 amino acids. It's structure is shown 
in Fig. UK 

To calculate the SASA of staphylococcal nuclease in the folded state the 3D struc- 
ture of the protein was taken from the Protein Data Bank [37| (PDB ID 1EYD). Using 
CHARMM27 28| forcefield and NAMD program 38| we performed the structural optimiza- 
tion of the protein and calculated SASA with the solvent probe radius 1.4 A. 

The value of SASA of the side-chain radicals in the folded protein conformation is equal 
to Sf =6858 A 2 . In order to calculate SASA for an unfolded protein state, the value of all 
angles <p and ip were put equal to 180°, corresponding to a fully stretched conformation. 
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Then, the optimization of the structure with the fixed angles </? and ip was performed. 
The optimized geometry of the stretched molecule has a minor dependence on the value 
of dielectric susceptibility of the solvent, therefore the value of dielectric susceptibility was 
chosen to be equal to 20, in order to mimic the screening of charges by the solvent. SASA 
of the side-chain radicals in the stretched conformation of the protein is equal to S u =15813 

A 2 . 

The change of the number of water molecules those interacting with the protein due to 
the unfolding process can be calculated as follows: 

N w = (S u - S f )n 2 '\ (20) 

where S u = 15813 A 2 and Sf = 6858 A 2 are the SASA of the protein in unfolded and in 
folded conformations, respectively and n is the density of the water molecules. The volume 
of one mole of water is equal to 18 cm 3 , therefore n ~ 30 A~ 3 

To account for the effects caused by the electrostatic interaction of water molecules with 
the charged groups of the protein it is necessary to evaluate the strength of the average 
electrostatic field E in Eq. ( TT5l) . The strength of the average field can be estimated as 
E ■ d = kT, where d is the dipole moment of a water molecule, k is Bolzmann constant and 
T=300 K is the room temperature. According to this estimate the energy of characteristic 
electrostatic interaction of water molecules is equal to the thermal energy per degree of 
freedom of a molecule. 

The total number of water molecules that interact with the electrostatic field of the 
protein can be estimated from the known Debye screening length of a charge in electrolyte 
as follows: 



Ne = N q ^f\l (21) 

where N q is the number of charged groups in the protein, p is the density of water and A 



is the Debye screenin 
calculated as follows 



; length. Debye screening length of the symmetric electrolyte can be 



Xd = \l^b r r (22) 
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where e is the permittivity of free space, e is the dielectric constant, A4 is the Avogadro 
number, e is the elementary charge and / is is the ionic strength of the electrolyte. 

The experiments on denaturation of staphylococcal nuclease and metmyoglobin were per- 
formed in 100 niM ion buffer of sodium chloride and lOmM buffer of sodium acetate respec- 



tively [22l. l23| . The Debye screening length in water with 10 mM and 100 mM concentration 
of ions is A^ =30 A and A^ =10 A at room temperature respectively. 

The described method allows to estimate the number of water molecules (./Ye) interacting 
with electric filed created by the charged groups of a protein. It should be considered as 
qualitative estimate since we have assumed the average electric field as being constant within 
a sphere of the radius A^, but in fact it experiences some variations. Thus, at the distances 
~15 A from the point charge the interaction energy of a H 2 molecule with the electric field 
becomes equal to ~ 0.02 kT (for this estimate we have used the linear growing distance- 



dependent dielectric susceptibility e = QR as derived in [40] for the atoms fully exposed to 
the solvent). However, we expect that the more accurate analysis accounting for the spatial 
variation of the electric field will not change significantly the results of the analysis reported 
here, because it is based on the physically correct picture of the effect and the realistic 
values of all the physical quantities. At physiological conditions staphylococcal nuclease has 



8 charged residues [4l|. The value of a for this protein varies within the interval from 1.29 
to 31.27 for Xd G[10..30] A. In our numerical analysis we have used the characteristic value 
of a equal to 2.5. 

Note that number of molecules interacting with the electrostatic field Ne and the strength 
of the electrostatic field E should be considered as the effective parameters of our model. 
In this work we do not perform accurate accounting for the spatial dependence of the elec- 
trostatic field. Instead, we introduce the parameters a and E that can be interpreted as 
effective values of the number of H 2 molecules and the strength of the electrostatic field 
correspondingly. Let us stress that the number of water molecules a and the strength of the 
field E are not independent parameters of our model because by choosing the higher value 
of E and smaller value of a or vice versa one can derive the same heat capacity profile. 

In this work we do not investigate the dependencies of the heat capacity profiles on the 
values of the parameters a and E. Below we focus on the investigation of the dependence 
of the protein heat capacity on the energy E at the fixed value of a and E equal to 2.5 and 
0.58 kcal/mol respectively. 
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pH value 7.0 5.0 4.5 3.88 3.23 

E (kcal/mol) 0.789 0.795 0.803 0.819 0.890 

TABLE II. Values of Eq for staphylococcal nuclease at different values of pH of the solvent 



An important parameter of the model is the energy difference between the two states 
of the protein normalized per one amino acid, E introduced in Eq. fl6]). This parameter 
describes both the energy loss due to the separation of the hydrophobic groups of the protein 
which attract in the native state of the protein due to Van-der-Waals interaction and the 
energy gain due to the formation of Van-der-Waals interactions of hydrophobic groups of 
the protein with H 2 molecules in the protein's unfolded state. Also, the difference of 
the electrostatic energy of the system in the folded and unfolded states is accounted for in 
E . The difference of the electrostatic energy may depend on various characteristics of the 
system, such as concentration of ions in the solvent and its pH, on the exact location of the 
charged sites in the native conformation of the protein and on the probability distribution 
of distances between charged amino acids in the unfolded state. Thus, exact calculation of 
E is rather difficult. It is a separate task which we do not intend to address in this work. 
Instead, in the current study the energy difference between the two phases of the protein 
is considered as a parameter of the model. We treat E as being dependent on external 
properties of the system, in particular on the pH value of the solution. 

Another characteristic of the protein foldingounfolding transition is its cooperativity. 
In the model it is described by the parameter k in Eq. ()1]). n describes the number of 
amino acids in the flexible regions of the protein. The staphylococcal nuclease possesses a 
prominent two-stage folding kinetics, therefore only 5-10% of amino acids is in the protein's 
flexible regions. Thus, the value of k for this protein is small. It can be estimated as being 
equal to 149 • 7% ~ 10 amino acids. 

The values of Eq for staphylococcal nuclease at different values of pH are given in Table HT1 

For the analysis of the variation of the thermodynamic properties of the system during 
the folding process one can omit all the contributions to the free energy of the system that 
do not alter significantly in the temperature range between -50°C and 150°. Therefore, from 
the expression for the total free energy of the system F we can subtract all slowly varying 
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contributions F as follows: 

5F = F - F = -(kT In Z - kTlnZo) = -fcTln ( — J, (23) 

(24) 

From Eq. (123]) follows that the subtraction of F corresponds to the division of the total 
partition function Z by the partition function of the subsystem (Z ) with slowly varying 
thermodynamical properties. Therefore, in order to simplify the expressions, one can divide 
the partition function in Eq. ( 1T7|) by the partition function of fully unfolded conformation 
of a protein (by Z®Zf w ) and by the partition function of N s free water molecules (by Z^ s ). 
Thus, Eq. ( ITT]) can be rewritten as follows: 



■&T -EiSSf^f^ri. m 



z w) I i ^_ K (i ~ (a- K))\(a-i)\ \Z U J \ Z 
With the use of Eq. ( TT9|) on can calculate the heat capacity of the system as follows: 

c(T) =A + B(T- T ) - T^p, (26) 

where the factors A and B are responsible for the absolute value and the inclination of the 
heat capacity curve respectively. These factors account for the contribution of stiff harmonic 
vibrational modes in the system (factor A) and for the unharmonic correction to these vi- 
brations (factors B and T ). The contribution of protein's stiff vibrational modes and the 
heat capacity of the fully unfolded conformation of protein is also included into these factors. 
In our numerical analysis we have adjusted the values of A, B and T in order to match 
experimental measurements. However, factors A, B and To should not be considered as 
parameters of our model since their values are not related to the thermodynamic character- 
istics of the folding-H-unfolding transition and depend not entirely on the properties of the 
protein but also on the properties of the solution, protein and ion concentrations, etc. 

In our calculations for staphylococcal nuclease we have used the values of A = 1.25 JK _1 g _1 , 
B = 6.25 • 10~ 3 JKrV 1 and T =323 °K in Eq. ([26]). 

The dependence of heat capacity on temperature calculated for staphylococcal nuclease 
at different pH values are presented in Fig. IIII Al by solid lines. The results of experimental 
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measurements form Ref. 22j are presented by symbols. From Fig. IHI Al it is seen that 
staphylococcal nuclease experience two folding-B-transitions in the range of pH between 3.78 
and 7.0. At the pH value 3.23 no peaks in the heat capacity is present. It means that 
the protein exists in the unfolded state over the whole range of experimentally accessible 
temperatures. 
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FIG. 2. Dependencies of the heat capacity on temperature for staphylococcal nuclease (see Fig.QJi) 
at different values of pH. Solid lines show results of the calculation, while symbols present experi- 
mental data from Ref. [22J. 



Comparison of the theoretical results with experimental data shows that our theoretical 
model reproduces experimental behavior better for the solvents with higher pH. The heat 
capacity peak arising at higher temperatures due to the standard folding-H-unfolding tran- 
sition is reproduced very well for pH values being in the region 4.5-7.0. The deviations at 
low temperatures can be attributed to the inaccuracy of the statistical mechanics model of 
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water in the vicinity of the freezing point. 

The accuracy of the statistical mechanics model for low pH values around 3.88 is also 
quite reasonable. The deviation of theoretical curves from experimental ones likely arise 
due to the alteration of the solvent properties at high concentration of protons or due to 
the change of partial charge of amino acids at pH values being far from the physiological 
conditions. 

Despite some difference between the predictions of the developed model and the experi- 
mental results arising at certain temperatures and values of pH the overall performance of 
the model can be considered as extremely good for such a complex process as structural 
folding transition of a large biological molecule. 

B. Heat capacity of met myoglobin 

Metmyoglobin is an oxidized form of a protein myoglobin. This is a monomeric protein 
containing a single five-coordinate heme whose function is to reversibly form a dioxygen 



adduct 42|. Metmyolobin consists of 153 amino acids and it's structure is shown in Fig. [T] 
on the right. 

In order to calculate SASA of side chain radicals of metmyoglobin exactly the same 
procedure as for staphylococcal nuclease was performed (see discussion in the previous sub- 
section). SASA in the folded and unfolded states of the protein has been calculated and is 
equal 6847 A 2 and 16926 A 2 respectively. Thus, there are 984 H2O molecules interacting 
with protein's hydrophobic surface in its unfolded state. 

The electrostatic interaction of water molecules with metmyoglobin was accounted for in 
the same way as for staphylococcal nuclease. The parameter a in Eq. (115J1 was chosen to 
be equal to 2.5. With this we derive that 10950 H 2 molecules involve in the interaction 
with the electrostatic field of metmyoglobin in its folded state. The strength of the field was 
chosen the same as for staphylococcal nuclease. 

The parameter k for metmyoglobin in Eq. @, describing the cooperativity of the 
foldingounfolding transition, differs significantly from that for staphylococcal nuclease. 
The transition in metmyoglobin is less cooperative than the transition in staphylococcal 



nuclease because metmyoglobin has intermediate partially folded states 43[]. Thus, while 



the rigid native-like core of the protein is formed, a significant fraction of amino acids in 
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pH value 



4.10 3.70 3.84 3.5 



£0 (kcal/mol) 1.128 1.150 1.165 1.2 



TABLE III. Values of Eq for metmyoglobin at different values of solvent pH. 



the flexible regions of the protein can exist in the unfolded state. We assume that 1/3 of 
metmyoglobin's amino acids are in the flexible region, i.e. the parameter k in Eq. (jlj) equal 
to 50. 

The values of Eq in Eq. ((6]) differ from that for staphylococcal nuclease and are compiled in 
Table ILTTl In our calculations for metmyoglobin we have used the values of A = 1.6 JK g , 
B = 8.25 • 10~ 3 JKrV 1 and T =323 °K in Eq. 
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FIG. 3. Dependencies of the heat capacity on temperature for horse heart metmyoglobin (see 
Fig. ffb) at different values of pH. Solid lines show the results of the calculation. Symbols present 



the experimental data from Ref. 



23] 



Solid lines in Fig. [3] show the dependence of the metmyoglobin's heat capacity on tern- 
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perature calculated using the developed theoretical model. The experimental data from 



Ref. 23] are shown by symbols. 

Metmyoglobin experiences two folding-B-unfolding transitions at the pH values exceeding 
3.5 which can be called as cold and heat denaturations of the protein. The dependence of 
the heat capacity on temperature therefore has two characteristic peaks, as seen in Fig. |3j 
Figure [3] shows that at pH lower than 3.84 metmyoglobin exists only in the unfolded state. 

The comparison of predictions of the developed theoretical model with the experimental 
data on heat capacity shows that the theoretical model is well applicable for metmyoglobin 
case as well. The good agreement of the theoretical and experimental heat capacity profiles 
over the whole range of temperatures and pH values shows that the model treats correctly 
the thermodynamics of the protein folding process. 

Our theory includes a number of parameters, namely the energy difference between two 
phases E , strength of the electrostatic field E, number of interacting H 2 molecules a, the 
parameter describing the cooperativity of the phase transition k, as well as other parameters 



introduced in Ref. [20] to treat the partition function of water. Three parameters, E, Eq 
and k, are dependent on the properties of a particular protein and on the pH of the solvent. 
We have adjusted the values of these parameters in order to reproduce the experimental 
data. All other parameters of the model describing the structure of energy levels of water 
molecules, their vibrational and librational frequencies, etc. are considered as fixed, being 
universal for all proteins. 

In spite of the model features of our approach, we want to stress that the complex behavior 
and the peculiarities in dependencies of the heat capacity on temperature are all very well 
reproduced by the developed model with only a few parameters. This was demonstrated 
for two proteins and we consider this result as a significant achievement. This fact supports 
our conclusion that the developed model can be used for the prediction of new features of 
phase transitions in various biomolecular systems. Indeed, from Figs. IIHAI and |3] one can 
extract a lot of useful information on the heat capacity profiles: the concave bending of the 
heat capacity profile for a completely unfolded protein, the temperature of the cold and heat 
denaturation, the absolute values of the heat capacity at the phase transition temperature, 
the broadening of heat capacity peaks. Another peculiarity which is well reproduced by 
our statistical mechanics model is the decrease of the heat capacity of the folded state of 
the protein in comparison with that for unfolded state and asymmetry of the heat capacity 
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peaks. 



IV. CONCLUSIONS 

We have developed a novel statistical mechanics model for the description of folding -B- unfolding 
processes in globular proteins obeying simple two- stage- like folding kinetics. The model is 
based on the construction of the partition function of the system as a sum over all statisti- 
cally significant conformational states of a protein. The partition function of each state is a 
product of partition function of a protein in a given conformational state, partition function 
of water molecules in pure water and a partition function of H2O molecules interacting with 
the protein. 

The introduced model includes a number of parameters responsible for certain physical 
properties of the system. The parameters were obtained from available experimental data 
and three of them (energy difference between two phases, cooperativity of the transition and 
the average strength of the protein's electrostatic field) were considered as being variable 
depending on a particular protein and pH of the solvent. 

We have compared the predictions of the developed model with the results of experimental 
measurements of the dependence of the heat capacity on temperature for staphylococcal 
nuclease and metmyoglobin. The experimental results were obtained at various pH of solvent. 
The suggested model is capable to reproduce well within a single framework a large number 
of peculiarities of the heat capacity profile, such as the temperatures of cold and heat 
denaturations, the corresponding maximum values of the heat capacities, the temperature 
range of the cold and heat denaturation transitions, the difference between heat capacities 
of the folded and unfolded states of the protein. 

The good agreement of the results of calculations obtained using the developed formalism 
with the results of experimental measurements demonstrates that it can be used for the 
analysis of thermodynamical properties of many biomolecular systems. Further development 
of the model can be focused on its advance and application for the description of the influence 
of mutations on protein stability, analysis of assembly and stability of protein complexes, 
protein crystallization process, etc. 
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